Enable graceful shutdown of servers #122


Closed

Conversation

stephentoub
Contributor

Fixes #10
Fixes #117

I'm not in love with where this landed, but I'm expecting it'll evolve further with @halter73's work, and it's an improvement over where things were.

I cleaned up a few other things along the way.

@halter73
Contributor

I know the tests have been flaky, but three timeouts seem worse than average.

I downloaded testresults-windows-latest-Debug.zip since I saw that SseClientTransportTests.ConnectAsync_Should_Connect_Successfully hung in that run, which seems to be the most common culprit even in other PRs, but I couldn't find any rooted instances of SseClientTransport, which I found odd. I wonder if we're awaiting a Task that gets GC'd without ever completing somewhere. Do you have any idea what might be going on?

While looking at that, I also saw a bunch of unrooted SseClientTransports that appear to be from passed SseServerIntegrationTests: they had faulted _receiveTasks from trying to write to the ITestOutputHelper after the test completed, but were still in the IsConnected state. At this point I realized that the IClientTransport created by McpClientFactory.Create() never gets disposed unless the client fails to connect. There's no good way for the caller to dispose the transport. The caller could provide their own createTransportFunc and store the return value, but they shouldn't have to.

Of course, it shouldn't be the responsibility of McpClient to dispose one of its constructor parameters. Maybe it'd be more acceptable given a leaveOpen: false argument, but I think the better solution is for McpClient to take an IClientTransportFactory (or maybe IClientConnectionFactory or IClientSessionFactory) so it controls the creation of the transport/connection/session. I'll try to open a PR with that tomorrow, rebased on top of this work.
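A minimal sketch of the factory shape described above (all names here are illustrative, not the actual SDK API):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shapes; the real SDK types may differ.
public interface IClientTransport : IAsyncDisposable
{
    bool IsConnected { get; }
}

public interface IClientTransportFactory
{
    Task<IClientTransport> CreateAsync(CancellationToken cancellationToken = default);
}

public sealed class McpClient : IAsyncDisposable
{
    private readonly IClientTransportFactory _transportFactory;
    private IClientTransport? _transport;

    public McpClient(IClientTransportFactory transportFactory) =>
        _transportFactory = transportFactory;

    public async Task ConnectAsync(CancellationToken cancellationToken = default) =>
        _transport = await _transportFactory.CreateAsync(cancellationToken).ConfigureAwait(false);

    // Because the client created the transport, it owns it and can safely
    // dispose it, closing the leak where the transport created inside
    // McpClientFactory.Create() was never disposed.
    public async ValueTask DisposeAsync()
    {
        if (_transport is { } transport)
        {
            await transport.DisposeAsync().ConfigureAwait(false);
        }
    }
}
```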

I think what is here is good so long as the increased test flakiness is an aberration.


@halter73 left a comment


Approving mostly so I can continue my work without too much more rebasing.

/// </summary>
Task StartAsync(CancellationToken cancellationToken = default);
Task RunAsync(Func<CancellationToken, Task>? onInitialized = null, CancellationToken cancellationToken = default);
Contributor


I'm not a fan of the onInitialized callback. I think the far more ergonomic pattern is the one used by generic host, where you can do something like this:

await server.StartAsync();
// Code to run after initialization
await server.WaitForShutdownAsync();

RunAsync is then built on top of that. It might be different if there were something that needed to run after onInitialized, but that doesn't appear to be the case. I can change this in a follow-up, though.
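Sketched concretely, the composition described above might look like this (a method-level fragment only; WaitForShutdownAsync is assumed to exist, mirroring Microsoft.Extensions.Hosting):

```csharp
// Illustrative only: RunAsync composed from StartAsync + WaitForShutdownAsync,
// following the generic host's pattern. No onInitialized callback is needed:
// a caller wanting post-initialization logic calls StartAsync itself, runs its
// code, then awaits WaitForShutdownAsync.
public async Task RunAsync(CancellationToken cancellationToken = default)
{
    await StartAsync(cancellationToken).ConfigureAwait(false);
    await WaitForShutdownAsync(cancellationToken).ConfigureAwait(false);
}
```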

// Wait for transport completion.
if (serverTransport is not null)
{
await serverTransport.Completion;
Contributor


Should RunAsync wait for the MessageProcessingTask to complete too? It's good to know that when RunAsync exits gracefully you can dispose your logging provider without truncating output.

At least CleanupAsync still awaits the MessageProcessingTask, but RunAsync doesn't call that unless there's an exception. If it's waiting for transport completion, it should probably now always call CleanupAsync in a finally instead.

I plan to remove IServerTransport.Completion in my follow-up, but we should be able to get the same effect by separating the listener and connection/session interfaces and disposing each at the appropriate time: during disposal of the IMcpServer, and after completion of the MessageProcessingTask, respectively.
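The restructuring suggested here might look like the following (member names follow the quoted snippet; the method body is illustrative, not the actual implementation):

```csharp
// Illustrative: always run cleanup in a finally, so MessageProcessingTask is
// awaited on both the graceful and the faulted paths.
public async Task RunAsync(CancellationToken cancellationToken = default)
{
    try
    {
        await StartAsync(cancellationToken).ConfigureAwait(false);

        // Wait for transport completion.
        if (_serverTransport is not null)
        {
            await _serverTransport.Completion.ConfigureAwait(false);
        }
    }
    finally
    {
        // CleanupAsync awaits MessageProcessingTask, so a logging provider can
        // be disposed after RunAsync returns without truncating output.
        await CleanupAsync().ConfigureAwait(false);
    }
}
```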

response.Headers.ContentType = "text/event-stream";
response.Headers.CacheControl = "no-cache";

await using var localTransport = transport = new SseResponseStreamTransport(response.Body);
await using var server = McpServerFactory.Create(transport, mcpServerOptions.Value, loggerFactory, endpoints.ServiceProvider);
Contributor


Don't we need to call IMcpServer.RunAsync() somewhere for this sample to continue to work? You can test it out using npx @modelcontextprotocol/inspector.
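For illustration, something along these lines at the end of the handler (assuming IMcpServer exposes the RunAsync shown earlier and the request's HttpContext is in scope as context):

```csharp
// Illustrative: run the server for the lifetime of the SSE request so the
// sample keeps streaming; RunAsync returns when the client disconnects.
await server.RunAsync(cancellationToken: context.RequestAborted);
```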

@stephentoub
Contributor Author

Approving mostly so I can continue my work without too much more rebasing.

Do you want to just take this over, incorporating it into your change and changing it to your liking?

return Task.CompletedTask;
await _shutdownTokenSource.CancelAsync().ConfigureAwait(false);
_listener.Stop();
await _listeningTask.ConfigureAwait(false);
Contributor


Should this also await Completed?
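If so, the stop path might become something like this (member names follow the quoted snippet; Completion is assumed to be the task the question refers to):

```csharp
// Illustrative: also drain in-flight message processing before reporting
// shutdown complete.
await _shutdownTokenSource.CancelAsync().ConfigureAwait(false);
_listener.Stop();
await _listeningTask.ConfigureAwait(false);
await Completion.ConfigureAwait(false); // assumed: the transport's completion task
```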

@halter73
Contributor

Do you want to just take this over, incorporating it into your change and changing it to your liking?

Sure.

ServerInstructions = initializeResponse.Instructions;

// Validate protocol version
if (initializeResponse.ProtocolVersion != _options.ProtocolVersion)
Collaborator


Not for this PR - but we need to figure out what "supports" means exactly in https://spec.modelcontextprotocol.io/specification/2025-03-26/basic/lifecycle/

We might risk disconnecting from servers we are compatible with.
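For illustration, a more tolerant check might compare against the set of protocol versions the client knows rather than requiring an exact match (names hypothetical; the spec's exact compatibility rules are the open question here):

```csharp
using System.Linq;

// Hypothetical: treat any protocol version the client supports as compatible,
// rather than disconnecting on anything but an exact match.
internal static class ProtocolVersions
{
    private static readonly string[] Supported = { "2024-11-05", "2025-03-26" };

    public static bool IsCompatible(string? serverVersion) =>
        serverVersion is not null && Supported.Contains(serverVersion);
}
```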

Contributor Author


Can you ensure we have an open issue for that? Thanks.

// Send random log messages every few seconds
var logLevelIndex = random.Next(loggingLevels.Count);
var logLevel = loggingLevels[logLevelIndex];
await server.SendMessageAsync(new JsonRpcNotification()
Collaborator


I think there is value in replicating the everything server's behavior in the TestServer (in this case the 15-second interval). Given that it's the official reference server, it allows us to reuse test code. I've found and reported several bugs in the reference server because of this approach.

Contributor Author


15 seconds is way too long for tests, though. It makes these test cases take upwards of 30 seconds, when they could otherwise finish almost immediately.
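One way to reconcile the two points might be to default to the reference server's 15-second cadence while letting tests shrink it (LogNotificationInterval and BuildLogNotification are hypothetical names, sketched around the quoted snippet):

```csharp
// Hypothetical sketch: keep the everything server's 15s default, but allow
// tests to override the interval so they finish quickly.
TimeSpan interval = options.LogNotificationInterval ?? TimeSpan.FromSeconds(15);

while (!cancellationToken.IsCancellationRequested)
{
    var logLevel = loggingLevels[random.Next(loggingLevels.Count)];
    await server.SendMessageAsync(BuildLogNotification(logLevel)); // hypothetical helper
    await Task.Delay(interval, cancellationToken);
}
```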

@stephentoub
Contributor Author

@halter73, I'm going to close this, since you've picked up the commits and incorporated it into your changes.
